
Conversation

Collaborator

@Sahilbhatane Sahilbhatane commented Nov 14, 2025

Summary

  • add native Kimi K2 and fake providers to the command interpreter and CLI

Testing

  • C:/Python314/python.exe test\run_all_tests.py

This pull request adds the test file needed for issue #40, and in doing so it also fixes #16.

Summary by CodeRabbit

  • New Features

    • Multi-provider LLM support (OpenAI, Claude, Kimi, FAKE), provider-aware API key handling, configurable Kimi endpoint, deterministic fake mode, default-model selection, contextual parsing, and dangerous-command filtering
  • Refactor

    • Major overhaul of user preferences with validation, serialization, import/export, and manager utilities
  • Documentation

    • Added contributor guide, testing strategy, and Kimi integration docs
  • Tests

    • New unit tests and Docker-backed end-to-end integration tests; expanded provider coverage
  • Chores

    • Broadened Python support (>=3.8), updated packaging metadata, added requests and PyYAML deps


Contributor

coderabbitai bot commented Nov 14, 2025

Warning

Rate limit exceeded

@Sahilbhatane has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 11 minutes and 29 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between ad1080d and dc87e2c.

📒 Files selected for processing (3)
  • cortex/cli.py (2 hunks)
  • requirements.txt (1 hunks)
  • test/test_cli.py (2 hunks)

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds multi-provider NL→shell translation (OpenAI, Anthropic, Kimi K2, FAKE), provider-aware CLI API-key selection and fake-mode handling, a reworked preferences manager with serialization, Docker-backed integration test utilities and end-to-end tests, and updates to packaging, requirements, and docs.
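As a rough sketch of the dispatch this describes (class and method names such as _call_kimi and _call_fake follow the summary, but the bodies here are illustrative stubs, not the actual implementation):

import json
from enum import Enum

class APIProvider(Enum):
    OPENAI = "openai"
    ANTHROPIC = "anthropic"
    KIMI = "kimi"
    FAKE = "fake"

class CommandInterpreter:
    """Illustrative dispatch only; the real handlers call the SDKs/HTTP API."""

    def __init__(self, provider="openai"):
        self.provider = APIProvider(provider)

    def parse(self, text):
        # Route the request to a provider-specific handler, then parse its output.
        handlers = {
            APIProvider.OPENAI: self._call_openai,
            APIProvider.ANTHROPIC: self._call_claude,
            APIProvider.KIMI: self._call_kimi,
            APIProvider.FAKE: self._call_fake,
        }
        raw = handlers[self.provider](text)
        return self._parse_commands(raw)

    # Stubs standing in for the real SDK/HTTP calls.
    def _call_openai(self, text):
        return '{"commands": ["echo openai"]}'

    def _call_claude(self, text):
        return '{"commands": ["echo claude"]}'

    def _call_kimi(self, text):
        return '{"commands": ["echo kimi"]}'

    def _call_fake(self, text):
        return '{"commands": ["echo fake"]}'

    def _parse_commands(self, raw):
        return json.loads(raw)["commands"]

The real handlers would perform the SDK and HTTP calls shown in the sequence diagram below.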

Changes

Cohort / File(s) Summary
LLM Multi-Provider Implementation
LLM/interpreter.py, LLM/test_interpreter.py, LLM/requirements.txt
Adds APIProvider.KIMI and APIProvider.FAKE; provider dispatch and handlers (_call_kimi, _call_fake); _default_model resolver; parsing (_parse_commands, parse_with_context) and validation (_validate_commands); Kimi HTTP integration and deterministic FAKE path. Tests expanded for Kimi unit/integration scenarios.
CLI Provider-Aware Enhancements
cortex/cli.py, test/test_cli.py
Adds provider override (CORTEX_PROVIDER), provider-specific _get_api_key(provider), KIMI_API_KEY handling, fake-mode env support (CORTEX_FAKE_COMMANDS/fake API key), and provider-aware install flows. Tests updated for explicit provider key resolution and fake-provider scenarios.
User Preferences System Refactoring
cortex/user_preferences.py
Replaces lightweight YAML dataclasses with richer classes/enums and ConflictSettings; adds to_dict/from_dict, PreferencesManager (load/save/get/set/reset/validate/export/import/list), backup handling, coercion helpers, stricter semantics, and changed return types.
Docker-Based Integration Testing
test/integration/*, test/test_coordinator.py
Adds DockerRunResult, docker_available(), run_in_docker() utilities (test/integration/docker_utils.py), package init (test/integration/__init__.py), and Docker-backed end-to-end tests (test/integration/test_end_to_end.py). Removes some plan-specific coordinator tests.
Project Configuration & Dependencies
setup.py, requirements.txt, requirements-dev.txt, LLM/requirements.txt
Lowers python_requires to >=3.8, adds 3.8/3.9 classifiers, changes license classifier to MIT, and adds runtime/dev deps: requests>=2.32.4 and PyYAML>=6.0.0.
Documentation & Guides
contribution.md, test.md, docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md
Adds contributor guide, testing guide with Docker/integration instructions, and Kimi K2 integration design doc covering API, CLI, testing, and migration notes.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as Cortex CLI
  participant Interpreter as CommandInterpreter
  participant SDK as OpenAI/Anthropic SDKs
  participant KimiAPI as Kimi HTTP API

  CLI->>Interpreter: NL→shell request (provider selected)
  alt provider == kimi
    Interpreter->>KimiAPI: POST structured payload (system+user+model, headers)
    KimiAPI-->>Interpreter: assistant response / error
    Interpreter->>Interpreter: parse_with_context → _parse_commands → _validate_commands
    Interpreter-->>CLI: commands or error
  else provider == openai/anthropic
    Interpreter->>SDK: SDK call (model, system prompt)
    SDK-->>Interpreter: assistant response
    Interpreter->>Interpreter: parse_with_context → _parse_commands → _validate_commands
    Interpreter-->>CLI: commands or error
  else provider == fake
    Interpreter->>Interpreter: load `CORTEX_FAKE_COMMANDS` or env/default
    Interpreter-->>CLI: deterministic commands
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50–70 minutes

  • Focus areas:
    • LLM/interpreter.py: provider dispatch, Kimi HTTP payload/response handling, error paths, parsing and safety validation.
    • cortex/user_preferences.py: serialization/deserialization, persistence semantics, API surface changes (load/save/set/reset) and error behavior.
    • test/integration/docker_utils.py & test/integration/test_end_to_end.py: docker command construction, environment/mount propagation, timeouts and reliability.
    • cortex/cli.py & test/test_cli.py: provider-aware API key mapping and fake-mode flows.
    • setup.py & requirements files: dependency and packaging metadata changes.

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • dhvll
  • mikejmorgan-ai

Poem

🐰 I hopped through code with whiskers bright,

Kimi replied and fake commands took flight,
Docker drums kept tests alight,
Preferences saved all through the night,
Hop—commit! —the rabbit’s delight.

Pre-merge checks and finishing touches

❌ Failed checks (3 warnings, 1 inconclusive)

  • Description check (⚠️ Warning)
    Explanation: The PR description is incomplete. It lacks the required 'Related Issue' section with proper issue closure syntax, only mentions the testing invocation without detail, and the claim about 'also fixed #16' is not substantiated in the description itself. The description does not follow the provided template structure.
    Resolution: Add 'Closes #40' and 'Closes #16' in a 'Related Issue' section, expand the 'Summary' with comprehensive details of changes across all modified files, and ensure the description matches the provided template format.
  • Out of Scope Changes check (⚠️ Warning)
    Explanation: Multiple significant changes appear out of scope relative to linked issue #16's focus on end-to-end integration testing. These include: cortex/user_preferences.py (complete preferences system refactoring), setup.py modifications (Python version requirements, license changes), requirements files updates, and CLI provider-specific key handling enhancements. While Kimi provider support enables testing, the broader refactoring and infrastructure changes are not required by issue #16.
    Resolution: Separate out-of-scope refactoring (user_preferences.py, setup.py, license changes) into a dedicated PR; retain only essential changes for issue #16 (Docker tests, interpreter/CLI updates necessary for testing scenarios). Consider splitting into focused PRs per the reviewer's feedback.
  • Docstring Coverage (⚠️ Warning)
    Explanation: Docstring coverage is 42.96%, which is below the required threshold of 80.00%.
    Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.
  • Linked Issues check (❓ Inconclusive)
    Explanation: The PR partially addresses linked issue #16 (end-to-end integration tests). New Docker-backed integration tests, the TestEndToEndWorkflows class, and the docker_utils module are present. However, the checklist requirements are not fully met: only 4 end-to-end tests are implemented (vs. 5+ required), CI/CD automation is not included in the PR, and documentation coverage is incomplete for the integration testing setup.
    Resolution: Verify that all 5+ required end-to-end tests are implemented, confirm CI/CD integration for automated test execution is configured, ensure comprehensive documentation for Docker-based testing is present, and explicitly validate the 'happy path' coverage.

✅ Passed checks (1 passed)

  • Title check (✅ Passed)
    Explanation: The title 'Added Kimi K2 provider tests Fixes #40' is partially related to the changeset. It focuses on Kimi provider tests and issue #40, but the PR actually implements multi-provider support (Kimi and FAKE) across interpreter, CLI, and testing with significant infrastructure changes (Docker integration tests, preferences refactoring, setup.py updates) that go beyond just 'provider tests.'

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@Sahilbhatane Sahilbhatane requested a review from dhvll November 14, 2025 16:40
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (20)
cortex/__init__.py (1)

1-2: Avoid duplicating the version string in multiple places.

__version__ = "0.1.0" here and version="0.1.0" in setup.py will drift over time. Consider defining the version in a single module (e.g., cortex/_version.py) and importing it both here and in setup.py.

setup.py (1)

7-8: Make requirements parsing robust to indented comments.

Current list comprehension:

requirements = [line.strip() for line in fh if line.strip() and not line.startswith("#")]

will include lines like " # comment" as a requirement because line.startswith("#") ignores leading whitespace. Consider:

stripped = line.strip()
if stripped and not stripped.startswith("#"):
    requirements.append(stripped)

to avoid accidentally treating indented comments as dependencies.

cortex/coordinator.py (5)

12-39: Use explicit None checks in duration for clarity.

To avoid relying on truthiness of floats and to be explicit about the sentinel value, consider:

def duration(self) -> Optional[float]:
    if self.start_time is not None and self.end_time is not None:
        return self.end_time - self.start_time
    return None

This is a minor readability/robustness improvement.


87-97: Avoid fully silent failures in _log.

Right now any error opening/writing the log file is swallowed:

try:
    ...
except Exception:
    pass

Consider at least emitting a minimal print(..., file=sys.stderr) or narrowing the exception type so that unexpected errors don’t fail silently. This keeps installation resilient while still giving a hint when logging is misconfigured.
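For instance, a sketch of the narrowed handler (self.log_file is assumed to be the coordinator's log-path attribute):

import sys

def _log(self, message):
    try:
        with open(self.log_file, "a", encoding="utf-8") as fh:
            fh.write(message + "\n")
    except OSError as exc:
        # Keep the installation running, but leave a breadcrumb for debugging.
        print(f"cortex: could not write log file: {exc}", file=sys.stderr)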


99-141: Be explicit about shell=True and broad exception handling trade-offs.

Given these commands are arbitrary shell lines, shell=True is probably required, but:

  • It’s a security footgun if ever exposed to untrusted input or run under elevated privileges.
  • Catching bare Exception here may hide unexpected programmer errors.

At minimum, consider documenting that InstallationCoordinator is intended for local, trusted environments and not for executing untrusted commands. If feasible, narrow the except Exception to specific failures (e.g., OSError) so unexpected errors surface during development.
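A sketch of that narrowing (the step.command, step.output, and step.error_message attributes are presumed from the surrounding coordinator code; the timeout value is illustrative):

import subprocess

def _execute_command(self, step):
    # shell=True is required for arbitrary shell lines; run trusted commands only.
    try:
        completed = subprocess.run(
            step.command, shell=True, capture_output=True, text=True, timeout=300
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Environment-level failures are expected; programmer errors still surface.
        step.error_message = str(exc)
        return False
    step.output = completed.stdout
    return completed.returncode == 0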


166-197: Progress callback is invoked before status is updated, so callers never see final state.

progress_callback is called before _execute_command, while step.status is still PENDING. The CLI’s callback uses step.status to print [OK]/[FAIL], so it will only ever see [PENDING].

Consider instead:

for i, step in enumerate(self.steps):
    if self.progress_callback:
        self.progress_callback(i + 1, len(self.steps), step)  # optional “starting” hook

    success = self._execute_command(step)

    if self.progress_callback:
        self.progress_callback(i + 1, len(self.steps), step)  # “completed” hook with final status

or move the callback to after _execute_command if you only want the completed status. This will make CLI progress output reflect real step outcomes.


269-307: install_docker is tightly coupled to Debian/Ubuntu + systemd environments.

The hard-coded apt commands, add-apt-repository, and systemctl checks make this helper effectively Ubuntu/Debian with systemd only. That’s fine for container-based tests, but if you plan to reuse it on other distros (e.g., Arch, Fedora, non-systemd setups), it may fail in surprising ways.

Consider either documenting this assumption clearly or wiring distro detection into the higher-level CLI so alternate installers can be plugged in later.

test/run_all_tests.py (1)

11-33: Test discovery correctly scoped to test/ but won’t pick tests outside

discover_tests() starts from os.path.dirname(__file__), so only tests under the test directory are discovered, which matches how cortex/cli.py locates run_all_tests.py. If you ever want LLM/test_interpreter.py (or other top-level tests) to be run via this entry point, you’ll need to broaden start_dir (e.g., repo root) or mirror those tests under test/.
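If you do broaden it, a sketch of repo-root discovery could look like this (assuming run_all_tests.py stays one level below the repo root and tests keep the default test*.py pattern):

import os
import unittest

def discover_tests():
    # Start discovery at the repo root so LLM/test_interpreter.py is picked up too.
    repo_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    return unittest.defaultTestLoader.discover(
        start_dir=repo_root, pattern="test*.py", top_level_dir=repo_root
    )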

README.md (1)

93-95: Contribution and testing references are helpful—verify link targets

Pointing contributors to Issues, contribution.md, and test.md is good for onboarding and testing discipline. Just double-check that these relative links resolve correctly in the GitHub UI from this README.

LLM/test_interpreter.py (1)

97-103: Remove redundant @patch('openai.OpenAI') to satisfy Ruff and simplify the test

In test_parse_empty_input, the mock_openai argument is unused, and setUp already stubs the openai module globally. You can safely drop the patch decorator and argument—this will also resolve Ruff’s ARG002 warning.

-    @patch('openai.OpenAI')
-    def test_parse_empty_input(self, mock_openai):
+    def test_parse_empty_input(self):
         interpreter = CommandInterpreter(api_key=self.api_key, provider="openai")
 
         with self.assertRaises(ValueError):
             interpreter.parse("")
test/test_cli.py (2)

32-37: Strengthen test_get_api_key_not_found by asserting error output

You already patch sys.stderr here, but the test only checks that the API key is None, and mock_stderr is otherwise unused (Ruff ARG002).

To fully exercise _get_api_key’s error path and use the mock, consider also asserting that something was written to stderr, e.g.:

@patch.dict(os.environ, {}, clear=True)
@patch('sys.stderr')
def test_get_api_key_not_found(self, mock_stderr) -> None:
-    api_key = self.cli._get_api_key('openai')
-    self.assertIsNone(api_key)
+    api_key = self.cli._get_api_key('openai')
+    self.assertIsNone(api_key)
+    self.assertTrue(mock_stderr.write.called)

This both validates the user-visible error and clears the static-analysis warning.


90-196: Avoid real spinner sleeps in install-path unit tests (optional)

All install tests that reach CortexCLI.install with a valid API key will run the 10‑iteration spinner loop, each iteration sleeping for 0.1s. As the number of tests grows, this can noticeably slow the suite.

If runtime becomes an issue, consider patching _animate_spinner for these tests:

@patch.object(CortexCLI, "_animate_spinner", return_value=None)
@patch.dict(os.environ, {'OPENAI_API_KEY': 'test-key'}, clear=True)
@patch('cortex.cli.CommandInterpreter')
def test_install_dry_run(self, mock_interpreter_class, _mock_spinner) -> None:
    ...

This keeps behaviour identical from the CLI’s perspective while making tests faster and less timing‑sensitive.

cortex/test_coordinator.py (1)

36-321: Coordinator tests look comprehensive; consider making test_step_timing more robust

The coordinator test suite here is thorough and matches the coordinator’s behaviour well (init, stop/continue on error, rollback, logging, verification, summaries, and JSON export).

One minor robustness point: in test_step_timing, you assert step.end_time > step.start_time. On platforms with coarse time.time() resolution, two consecutive calls can occasionally return the same value, which would make this test intermittently fail even though the coordinator is behaving correctly.

If you want to eliminate that flakiness risk, you could either:

  • Relax to >= or
  • Patch time.time() to return controlled, strictly increasing values during the test.

Otherwise, this module looks solid.
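For the patching option, a sketch (the exact patch target depends on how cortex/coordinator.py imports time):

from unittest.mock import patch

# side_effect with a list makes each time.time() call return the next value,
# guaranteeing end_time > start_time regardless of timer resolution.
with patch("cortex.coordinator.time.time", side_effect=[100.0, 100.5, 101.0, 101.5]):
    ...  # run the coordinator step under test here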

LLM/interpreter.py (1)

154-185: Minor cleanup in _call_fake: avoid double JSON parsing of CORTEX_FAKE_COMMANDS

The fake provider path correctly validates that CORTEX_FAKE_COMMANDS is JSON with a top‑level "commands" key, but then discards the parsed data and calls _parse_commands(payload), which re‑parses the same JSON string.

Functionally this is fine, but you can simplify and avoid double parsing by reusing data, for example:

        payload = os.environ.get("CORTEX_FAKE_COMMANDS")
        if payload:
            try:
-                data = json.loads(payload)
+                data = json.loads(payload)
             except json.JSONDecodeError as exc:
                 raise ValueError("CORTEX_FAKE_COMMANDS must contain valid JSON") from exc
-            if not isinstance(data, dict) or "commands" not in data:
-                raise ValueError("CORTEX_FAKE_COMMANDS must define a 'commands' list")
-            return self._parse_commands(payload)
+            if not isinstance(data, dict) or "commands" not in data:
+                raise ValueError("CORTEX_FAKE_COMMANDS must define a 'commands' list")
+            commands = data["commands"]
+            if not isinstance(commands, list):
+                raise ValueError("CORTEX_FAKE_COMMANDS 'commands' must be a list")
+            return [cmd for cmd in commands if cmd and isinstance(cmd, str)]

parse() will still run _validate_commands on this list, so you keep the safety guarantees while reducing redundant work.

test/integration/docker_utils.py (1)

55-108: Mounted paths are documented as read‑only but are actually read‑write

In run_in_docker() the mounts parameter is documented as:

“Iterable of host Path instances mounted read-only to the same location within the container.”

However, the volume flag is built as:

docker_cmd.extend([
    "-v",
    f"{str(host_path.resolve())}:{container_path}",
])

Without a :ro suffix, these mounts are read‑write by default. For integration tests this may be intentional (e.g., writing artifacts back to the host), but it contradicts the docstring and slightly weakens isolation.

Consider either:

  • Appending :ro to the volume spec to enforce read‑only mounts, or
  • Updating the docstring to reflect that mounts are read‑write and, if needed, adding an option to request read‑only behaviour.
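A sketch of the second option's opt-in read-only flag (the helper name and read_only parameter are hypothetical):

def _volume_args(mounts, read_only=True):
    # read_only is a suggested new option; the current code always omits ":ro".
    args = []
    for host_path in mounts:
        target = str(host_path.resolve())
        suffix = ":ro" if read_only else ""
        args.extend(["-v", f"{target}:{target}{suffix}"])
    return args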
cortex/cli.py (5)

22-36: Provider fallback could cause confusing error messages.

Line 36 returns 'openai' as the default even when no OPENAI_API_KEY is present. This will cause a delayed error in _get_api_key rather than failing fast with a clear message that no provider is configured.

Consider one of these approaches:

Option 1: Return None and handle in caller

         return 'fake'
-    return 'openai'
+    return None  # No provider configured

Then check in the install method:

provider = self._get_provider()
if provider is None:
    self._print_error("No LLM provider configured. Set OPENAI_API_KEY, ANTHROPIC_API_KEY, or KIMI_API_KEY.")
    return 1

Option 2: Emit clear error immediately

         return 'fake'
-    return 'openai'
+    self._print_error("No LLM provider configured. Set one of: OPENAI_API_KEY, ANTHROPIC_API_KEY, or KIMI_API_KEY.")
+    sys.exit(1)

92-108: Consider removing redundant "install" prefix.

Line 103 prefixes the software parameter with "install " when calling the interpreter. Since the CLI command is already install and the parameter is named software, this might be redundant or confusing. Users might naturally include verbs in their input (e.g., cortex install "set up docker").

Consider passing the software parameter directly:

-            commands = interpreter.parse(f"install {software}")
+            commands = interpreter.parse(software)

Or make the prompt more explicit:

-            commands = interpreter.parse(f"install {software}")
+            commands = interpreter.parse(f"Generate shell commands to install {software}")

118-156: Execute mode logic is sound, but consider adding else block.

The execution flow with progress callbacks and error handling is well-structured. However, Lines 153-155 should be in an else block for clarity, as flagged by static analysis (TRY300).

Apply this diff to improve structure:

                     print(f"  Error: {result.error_message}", file=sys.stderr)
                 return 1
+            else:
+                print("\nTo execute these commands, run with --execute flag")
+                print("Example: cortex install docker --execute")
+                return 0
 
-            print("\nTo execute these commands, run with --execute flag")
-            print("Example: cortex install docker --execute")
-            return 0
-
         except ValueError as exc:

157-166: Exception handling is appropriate for CLI entry point.

The tiered exception handling (ValueError for input errors, RuntimeError for API failures, and Exception as fallback) is suitable for a CLI entry point where graceful error messages are important. The blind Exception catch on Line 163 prevents unexpected tracebacks from reaching end users, which is acceptable in this context.

Static analysis suggests using explicit conversion flags in f-strings (RUF010):

-            self._print_error(f"API call failed: {str(exc)}")
+            self._print_error(f"API call failed: {exc!s}")
-            self._print_error(f"Unexpected error: {str(exc)}")
+            self._print_error(f"Unexpected error: {exc!s}")

These are minor style improvements and not critical.


197-198: Consider making --execute and --dry-run mutually exclusive.

Both flags can currently be specified together, and dry_run takes precedence (checked first on Line 114). This could confuse users who expect both flags to trigger an error.

Make the flags mutually exclusive:

     install_parser = subparsers.add_parser('install', help='Install software using natural language')
     install_parser.add_argument('software', type=str, help='Software to install (natural language)')
-    install_parser.add_argument('--execute', action='store_true', help='Execute the generated commands')
-    install_parser.add_argument('--dry-run', action='store_true', help='Show commands without executing')
+    
+    exec_group = install_parser.add_mutually_exclusive_group()
+    exec_group.add_argument('--execute', action='store_true', help='Execute the generated commands')
+    exec_group.add_argument('--dry-run', action='store_true', help='Show commands without executing')
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90236f7 and fe37193.

📒 Files selected for processing (19)
  • .gitignore (1 hunks)
  • LLM/interpreter.py (9 hunks)
  • LLM/requirements.txt (1 hunks)
  • LLM/test_interpreter.py (11 hunks)
  • MANIFEST.in (1 hunks)
  • README.md (2 hunks)
  • contribution.md (1 hunks)
  • cortex/__init__.py (1 hunks)
  • cortex/cli.py (1 hunks)
  • cortex/coordinator.py (1 hunks)
  • cortex/test_cli.py (1 hunks)
  • cortex/test_coordinator.py (1 hunks)
  • setup.py (1 hunks)
  • test.md (1 hunks)
  • test/integration/__init__.py (1 hunks)
  • test/integration/docker_utils.py (1 hunks)
  • test/integration/test_end_to_end.py (1 hunks)
  • test/run_all_tests.py (1 hunks)
  • test/test_cli.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
test/run_all_tests.py (1)
cortex/cli.py (1)
  • main (168-222)
cortex/cli.py (3)
LLM/interpreter.py (2)
  • CommandInterpreter (18-268)
  • parse (224-243)
cortex/coordinator.py (3)
  • InstallationCoordinator (53-266)
  • StepStatus (12-19)
  • execute (166-213)
test/run_all_tests.py (1)
  • main (19-33)
LLM/test_interpreter.py (1)
LLM/interpreter.py (4)
  • CommandInterpreter (18-268)
  • APIProvider (9-15)
  • parse (224-243)
  • _call_kimi (118-152)
test/integration/test_end_to_end.py (3)
test/integration/docker_utils.py (3)
  • docker_available (26-52)
  • run_in_docker (55-108)
  • succeeded (21-23)
cortex/cli.py (1)
  • main (168-222)
test/run_all_tests.py (1)
  • main (19-33)
cortex/test_coordinator.py (1)
cortex/coordinator.py (11)
  • InstallationCoordinator (53-266)
  • InstallationStep (23-39)
  • InstallationResult (43-50)
  • StepStatus (12-19)
  • install_docker (269-307)
  • duration (35-39)
  • execute (166-213)
  • add_rollback_command (162-164)
  • verify_installation (215-237)
  • get_summary (239-261)
  • export_log (263-266)
cortex/test_cli.py (1)
cortex/cli.py (9)
  • CortexCLI (14-165)
  • main (168-222)
  • _get_api_key (38-54)
  • _get_provider (22-36)
  • _print_status (56-58)
  • _print_error (60-62)
  • _print_success (64-66)
  • install (80-165)
  • _animate_spinner (68-73)
cortex/coordinator.py (1)
cortex/cli.py (1)
  • progress_callback (119-126)
cortex/__init__.py (1)
cortex/cli.py (1)
  • main (168-222)
test/test_cli.py (1)
cortex/cli.py (9)
  • CortexCLI (14-165)
  • main (168-222)
  • _get_api_key (38-54)
  • _get_provider (22-36)
  • _print_status (56-58)
  • _print_error (60-62)
  • _print_success (64-66)
  • install (80-165)
  • _animate_spinner (68-73)
🪛 Ruff (0.14.4)
test/integration/docker_utils.py

34-34: subprocess call: check for execution of untrusted input

(S603)


42-42: subprocess call: check for execution of untrusted input

(S603)


50-50: Consider moving this statement to an else block

(TRY300)


91-91: Use explicit conversion flag

Replace with conversion flag

(RUF010)


99-99: subprocess call: check for execution of untrusted input

(S603)

cortex/cli.py

155-155: Consider moving this statement to an else block

(TRY300)


161-161: Use explicit conversion flag

Replace with conversion flag

(RUF010)


163-163: Do not catch blind exception: Exception

(BLE001)


164-164: Use explicit conversion flag

Replace with conversion flag

(RUF010)


210-210: subprocess call: check for execution of untrusted input

(S603)

LLM/test_interpreter.py

98-98: Unused method argument: mock_openai

(ARG002)

cortex/test_cli.py

28-28: Unused method argument: mock_stderr

(ARG002)


48-48: Unused method argument: mock_stderr

(ARG002)


53-53: Unused method argument: mock_stdout

(ARG002)

cortex/coordinator.py

75-75: Avoid specifying long messages outside the exception class

(TRY003)


96-97: try-except-pass detected, consider logging the exception

(S110)


96-96: Do not catch blind exception: Exception

(BLE001)


107-107: subprocess call with shell=True identified, security issue

(S602)


136-136: Do not catch blind exception: Exception

(BLE001)


140-140: Use explicit conversion flag

Replace with conversion flag

(RUF010)


153-153: subprocess call with shell=True identified, security issue

(S602)


159-159: Do not catch blind exception: Exception

(BLE001)


160-160: Use explicit conversion flag

Replace with conversion flag

(RUF010)


223-223: subprocess call with shell=True identified, security issue

(S602)


233-233: Do not catch blind exception: Exception

(BLE001)


235-235: Use explicit conversion flag

Replace with conversion flag

(RUF010)

LLM/interpreter.py

55-55: Avoid specifying long messages outside the exception class

(TRY003)


146-146: Abstract raise to an inner function

(TRY301)


146-146: Avoid specifying long messages outside the exception class

(TRY003)


149-149: Abstract raise to an inner function

(TRY301)


149-149: Avoid specifying long messages outside the exception class

(TRY003)


152-152: Avoid specifying long messages outside the exception class

(TRY003)


152-152: Use explicit conversion flag

Replace with conversion flag

(RUF010)


162-162: Avoid specifying long messages outside the exception class

(TRY003)


164-164: Avoid specifying long messages outside the exception class

(TRY003)


227-227: Avoid specifying long messages outside the exception class

(TRY003)

test/test_cli.py

34-34: Unused method argument: mock_stderr

(ARG002)

🔇 Additional comments (28)
.gitignore (1)

1-30: Solid foundation for Python project hygiene.

The .gitignore file appropriately covers Python bytecode, build artifacts, virtual environments, and testing/coverage outputs—all essential for the multi-provider LLM CLI and Docker-backed integration test infrastructure this PR introduces. Patterns are syntactically correct and follow standard conventions.

MANIFEST.in (1)

1-5: Packaging manifest looks appropriate for the current project layout.

README, LICENSE, Python sources, and LLM requirements are included; this is a reasonable minimal set for distribution.

contribution.md (1)

12-20: Confirm src/requirements.txt path and keep install docs aligned.

The setup instructions depend on src/requirements.txt. Please double-check that this file exists in the repo and that other docs (e.g., test.md, README) reference the same path so new contributors don’t hit a missing-file error.

test.md (1)

1-67: Testing guide is clear and aligns with the described test suites.

Unit vs integration responsibilities, Docker usage, and CI recommendations are well scoped and actionable; this should make it straightforward for contributors to run the right tests locally and in CI.

test/integration/__init__.py (1)

1-1: Clear package-level docstring

The docstring succinctly documents the purpose of the integration test package; no changes needed.

README.md (1)

61-76: Provider table and AI layer docs align with implementation

The updated AI stack line and the “Supported LLM Providers” table accurately reflect the current providers and defaults (OPENAI_API_KEY, ANTHROPIC_API_KEY, KIMI_API_KEY, CORTEX_PROVIDER=fake with CORTEX_FAKE_COMMANDS). This should make configuring OpenAI/Claude/Kimi/Fake much clearer for users.

LLM/test_interpreter.py (7)

2-10: Test module setup is pragmatic and isolates interpreter dependencies

Using sys.path.insert plus sys.modules stubs for openai and anthropic is a practical way to exercise CommandInterpreter without requiring real SDKs. This keeps tests self-contained and avoids network calls while still letting you assert initialization behavior.


15-24: Shared stubs in setUp keep provider imports safe across tests

The setUp method’s patch.dict(sys.modules, {"openai": ..., "anthropic": ...}) ensures that every test uses stubs instead of real clients, which nicely centralizes mocking and avoids per-test boilerplate for import safety.


49-55: Kimi initialization test cleanly validates defaults and dependency handling

The test_initialization_kimi case, with @patch.dict(os.environ, {}, clear=True) and a stubbed requests module, correctly validates that the interpreter:

  • Selects APIProvider.KIMI
  • Falls back to the kimi-k2 default model when KIMI_DEFAULT_MODEL is unset
  • Requires requests to be importable

This is a solid pattern for providers that use plain HTTP clients rather than SDKs.


104-199: OpenAI/Claude call and parsing tests comprehensively cover core flows

The tests around _call_openai, _call_claude, parse, and parse_with_context (including success and failure branches, validation on/off, and context injection) exercise the main provider paths and the dangerous-command filtering well. They should catch most regressions in prompt formatting, JSON parsing, and validation logic.


221-242: Docker installation parsing test validates realistic multi-command output

test_parse_docker_installation ensures that the interpreter returns a non-empty list and that at least one command mentions docker, using a realistic multi-command JSON payload. This is a good high-level sanity check on both parsing and basic intent preservation.


243-254: Fake provider tests nicely pin env-driven and default behaviors

The FAKE provider tests:

  • Confirm that CORTEX_FAKE_COMMANDS JSON is honored exactly when set.
  • Fall back to the safe default docker command sequence when the env var is absent.

This gives you deterministic offline behavior and protects against regressions in the fake provider path.
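For reference, a minimal sketch of driving that env path from a test (the JSON shape with a top-level "commands" list matches what the interpreter validates):

import json
import os
from unittest.mock import patch

payload = json.dumps({"commands": ["sudo apt-get update",
                                   "sudo apt-get install -y docker.io"]})
with patch.dict(os.environ, {"CORTEX_FAKE_COMMANDS": payload}):
    # A FAKE-provider interpreter created here would return exactly these
    # commands from parse(); the env var must be JSON with a "commands" list.
    assert json.loads(os.environ["CORTEX_FAKE_COMMANDS"])["commands"]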


255-284: Kimi HTTP tests correctly isolate requests and exercise success/error paths

The Kimi tests stub requests via sys.modules, then:

  • Validate happy-path JSON handling and command extraction in _call_kimi.
  • Ensure any underlying exception from post is surfaced as a RuntimeError.

This pattern cleanly decouples tests from the real HTTP client and should make future refactors to Kimi integration safer.

test/integration/test_end_to_end.py (8)

1-21: LGTM! Well-structured test module setup.

The module docstring, imports, and constants are well-organized. The Docker image selection via environment variable provides good flexibility, and the base environment variables are appropriate for containerized testing.


23-37: LGTM! Clean helper method for Docker execution.

The test class properly skips when Docker is unavailable, and the _run helper method correctly merges environments and prepends pip bootstrapping. The use of dict | None syntax with from __future__ import annotations ensures compatibility.


39-44: LGTM! Effective smoke test.

This test provides a good baseline check that the CLI entry point is functional in a clean container environment.


46-64: LGTM! Comprehensive dry-run test.

The test properly configures the fake provider and validates that dry-run mode displays generated commands without executing them. The JSON structure correctly matches the expected format.


65-82: LGTM! Good coverage of execution flow.

This test validates the execute path without requiring real API calls or executing potentially dangerous commands. The assertion checks for the expected success message.


83-99: LGTM! Good component-level test.

The test properly isolates the coordinator component and uses secure heredoc syntax to prevent variable expansion. This validates the coordinator works correctly inside the container environment.


113-114: LGTM! Standard unittest main block.

The main block follows standard Python testing patterns and is appropriately marked for coverage exclusion.


100-111: Assertion is valid and test is well-structured.

The unittest runner correctly outputs "OK" in the last line of stdout, as verified. The assertion self.assertIn("OK", result.stdout.splitlines()[-1]) is sound and will reliably detect successful test runs even when additional details like skipped test counts are present.

cortex/cli.py (7)

1-12: LGTM! Clean module setup.

The imports and module docstring are well-organized and appropriate for the CLI implementation.


14-21: LGTM! Clean initialization.

The spinner initialization is straightforward and uses appropriate Unicode braille patterns for visual feedback.


38-54: LGTM! Clean API key retrieval with helpful error messages.

The method properly maps providers to environment variables and provides clear guidance when keys are missing.


56-79: LGTM! Well-designed output helpers.

The helper methods provide clean abstractions for console output with appropriate use of stdout/stderr and ANSI escape sequences.


109-117: LGTM! Clear command display and dry-run handling.

The generated commands are displayed in a user-friendly numbered format, and dry-run mode exits appropriately with helpful guidance.


202-211: LGTM! Test runner integration is secure.

The test flag handling correctly constructs the path to the test runner and executes it using the current Python interpreter. The static analysis warning on Line 210 (S603) is a false positive—the test_runner path is derived from __file__, not user input.


213-226: LGTM! Clean command dispatch and entry point.

The command dispatch logic is straightforward, and the main block follows standard Python patterns.

Collaborator

dhvll commented Nov 14, 2025

@Sahilbhatane why are you still solving multiple issues in a single PR? Limit each PR to one issue.
And give some kind of demonstration that it's working; a video is best.
Too much AI slop is not good. You can use AI, but understand the code and keep the files structured.
Do a self-review before raising PRs.

@dhvll dhvll closed this Nov 14, 2025
@Sahilbhatane
Collaborator Author

Hey @dhvll, here is the video you asked for:
https://www.loom.com/share/338206862fd94b1db8178dc21366471f

I will also share this on your Discord. The video shows the CLI implementation and the Kimi K2 provider implementation as well. We can go over other details, reviews, and conversation on Discord.

@mikejmorgan-ai
Member

🎯 Closing for MVP Focus

This issue is being closed to help the team focus on MVP-critical features (#1-45).

This is NOT abandoned - it's an important feature we'll revisit after MVP completion.

Timeline:

Want to work on this anyway?
Comment below and we can discuss! We're always open to great contributions.

Tracking: Labeled as post-mvp for easy filtering when we reopen.

Thanks for understanding! 🚀

— Mike (@mikejmorgan-ai)

@Sahilbhatane Sahilbhatane reopened this Dec 2, 2025
@Sahilbhatane Sahilbhatane changed the title from "Added Kimi K2 provider and Docker integration tests Fixes #40 and #16(optional)" to "Added Kimi K2 provider tests Fixes #40" Dec 2, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
LLM/interpreter.py (1)

262-271: Default model identifiers have been updated appropriately.

The defaults have been updated from the previous review:

  • OpenAI: "gpt-4o" (updated from deprecated "gpt-4")
  • Kimi: "kimi-k2-turbo-preview" with env override support

This addresses the concerns from the previous review about deprecated model identifiers.

🧹 Nitpick comments (11)
test/test_coordinator.py (2)

5-5: Unused import.

The time module is imported but doesn't appear to be used in this file.

-import time

319-319: Prefer assertGreater over assertTrue with comparison.

As flagged by SonarCloud, assertGreater provides better failure messages and is the idiomatic choice for this comparison.

-            self.assertTrue(step.end_time > step.start_time)
+            self.assertGreater(step.end_time, step.start_time)
test/test_cli.py (1)

31-35: Consider prefixing unused mock argument.

The mock_stderr parameter is used only to suppress stderr output during the test. Prefixing with underscore signals intentional non-use.

     @patch.dict(os.environ, {}, clear=True)
     @patch('sys.stderr')
-    def test_get_api_key_not_found(self, mock_stderr):
+    def test_get_api_key_not_found(self, _mock_stderr):
         api_key = self.cli._get_api_key('openai')
         self.assertIsNone(api_key)
setup.py (1)

6-6: Minor typo in comment.

The comment has an extra # character.

-# # Try to read requirements from root, fallback to LLM directory
+# Try to read requirements from root, fallback to LLM directory
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1)

190-194: Add language specifier to fenced code block.

The test output code block (lines 190-194) is missing a language specifier. Add text or plaintext for better markdown rendering.

-```
+```text
 Ran 143 tests in 10.136s
 
 OK (skipped=5)

LLM/interpreter.py (1)

135-136: Redundant import of requests inside _call_kimi.

The requests module is already imported and validated during _initialize_client() (line 53). This second import inside _call_kimi is unnecessary, and the ImportError handling at lines 152-153 will never trigger if initialization succeeded.


         try:
-            import requests
             response = requests.post(
                 f"{self._kimi_base_url.rstrip('/')}/v1/chat/completions",
                 headers=headers,
                 json=payload,
                 timeout=60,
             )

Note: This also means removing the ImportError catch at lines 152-153 since initialization already guarantees requests is available.

cortex/user_preferences.py (5)

201-205: Silent exception swallowing hides initialization errors.

The try-except-pass pattern silently ignores all exceptions during initial load. This can hide configuration corruption or permission issues.

Consider logging the exception or at least catching a more specific exception type:

         try:
             self.load()
-        except Exception:
-            # Ignore load errors here; callers can handle later
-            pass
+        except (IOError, ValueError) as e:
+            # Log but don't fail - callers can handle missing config
+            import logging
+            logging.debug(f"Initial config load skipped: {e}")

328-342: Unused key_parts parameter in _coerce_value_for_set.

The key_parts parameter is defined but never used in the method body. Either remove it or use it to provide context-aware coercion (e.g., knowing the target type based on the key path).

-    def _coerce_value_for_set(self, key_parts: List[str], value: Any) -> Any:
+    def _coerce_value_for_set(self, value: Any) -> Any:
         """Coerce string inputs into appropriate types for set()."""

Then update the call site at line 361:

-            coerced = self._coerce_value_for_set(parts, value)
+            coerced = self._coerce_value_for_set(value)

344-432: set() method has excessive cognitive complexity (56 vs. allowed 15).

This method handles many different preference types with nested conditionals, making it difficult to maintain and test. Consider refactoring into a dispatch pattern or strategy-based approach.

One approach is to use a handler registry:

def set(self, key: str, value: Any) -> bool:
    if self._preferences is None:
        self.load()
    
    parts = key.split(".")
    coerced = self._coerce_value_for_set(value)
    
    handlers = {
        "verbosity": self._set_verbosity,
        "confirmations": self._set_confirmations,
        "auto_update": self._set_auto_update,
        "ai": self._set_ai,
        "packages": self._set_packages,
        "conflicts": self._set_conflicts,
    }
    
    handler = handlers.get(parts[0])
    if handler:
        handler(parts, coerced, value)
    elif parts[0] in ["theme", "language", "timezone"]:
        setattr(self._preferences, parts[0], coerced)
    else:
        raise ValueError(f"Unknown preference key: {key}")
    
    self.save()
    return True

Then extract each branch into a dedicated _set_* method.


434-461: reset() always returns True; there is no False return path.

The method signature promises a boolean return indicating success, but it always returns True or raises an exception. This is fine semantically (exceptions for errors, True for success), but SonarCloud flags it because there's no False return path.

If the intent is that errors always raise exceptions, consider changing the return type to None or documenting that exceptions are the error signaling mechanism.


131-166: from_dict has broad exception handling that masks errors.

Lines 141-142 and 151-152 catch bare Exception and silently default to fallback values. This can hide data corruption or invalid enum values without any indication to the caller.

Consider catching the specific ValueError that Enum() raises:

             try:
                 ai_data['creativity'] = AICreativity(ai_data['creativity'])
-            except Exception:
+            except ValueError:
                 ai_data['creativity'] = AICreativity.BALANCED

Similarly for verbosity at lines 149-152.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe37193 and 9b9a36e.

📒 Files selected for processing (11)
  • LLM/interpreter.py (9 hunks)
  • LLM/requirements.txt (1 hunks)
  • LLM/test_interpreter.py (5 hunks)
  • cortex/cli.py (3 hunks)
  • cortex/user_preferences.py (5 hunks)
  • docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1 hunks)
  • requirements-dev.txt (1 hunks)
  • requirements.txt (1 hunks)
  • setup.py (3 hunks)
  • test/test_cli.py (3 hunks)
  • test/test_coordinator.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • requirements.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • LLM/requirements.txt
🧰 Additional context used
🧬 Code graph analysis (4)
test/test_cli.py (1)
cortex/cli.py (3)
  • _get_api_key (47-63)
  • _get_provider (31-45)
  • install (84-212)
LLM/test_interpreter.py (1)
LLM/interpreter.py (3)
  • APIProvider (9-15)
  • _call_kimi (118-155)
  • parse (227-246)
test/test_coordinator.py (1)
cortex/coordinator.py (5)
  • InstallationCoordinator (65-350)
  • InstallationStep (40-53)
  • InstallationResult (57-62)
  • StepStatus (31-36)
  • install_docker (353-407)
cortex/cli.py (1)
cortex/user_preferences.py (1)
  • get (288-315)
🪛 GitHub Actions: Cortex Automation
cortex/user_preferences.py

[error] 12-12: ModuleNotFoundError: No module named 'yaml'. PyYAML is not installed; importing 'yaml' fails during test run.

🪛 GitHub Check: SonarCloud Code Analysis
test/test_coordinator.py

[warning] 319-319: Consider using "assertGreater" instead.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZqDPfPMNlR5npI5tRyC&open=AZqDPfPMNlR5npI5tRyC&pullRequest=192

cortex/user_preferences.py

[warning] 328-328: Remove the unused function parameter "key_parts".

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrfnHzVPc78BrE-3Apy&open=AZrfnHzVPc78BrE-3Apy&pullRequest=192


[failure] 344-344: Refactor this function to reduce its Cognitive Complexity from 56 to the 15 allowed.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrfnHzVPc78BrE-3Apz&open=AZrfnHzVPc78BrE-3Apz&pullRequest=192


[failure] 434-434: Refactor this method to not always return the same value.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrfnHzVPc78BrE-3Ap0&open=AZrfnHzVPc78BrE-3Ap0&pullRequest=192

🪛 LanguageTool
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md

[style] ~214-~214: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ... addressed in PR #192) - Issue #11: CLI interface improvements (referenced in commits) - ...

(ACRONYM_TAUTOLOGY)

🪛 markdownlint-cli2 (0.18.1)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md

3-3: Bare URL used

(MD034, no-bare-urls)


4-4: Bare URL used

(MD034, no-bare-urls)


190-190: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


234-234: Bare URL used

(MD034, no-bare-urls)


235-235: Bare URL used

(MD034, no-bare-urls)


236-236: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.7)
test/test_cli.py

33-33: Unused method argument: mock_stderr

(ARG002)


80-80: Unused method argument: mock_history

(ARG002)

LLM/interpreter.py

55-55: Avoid specifying long messages outside the exception class

(TRY003)


147-147: Abstract raise to an inner function

(TRY301)


147-147: Avoid specifying long messages outside the exception class

(TRY003)


150-150: Abstract raise to an inner function

(TRY301)


150-150: Avoid specifying long messages outside the exception class

(TRY003)


153-153: Avoid specifying long messages outside the exception class

(TRY003)


155-155: Avoid specifying long messages outside the exception class

(TRY003)


155-155: Use explicit conversion flag

Replace with conversion flag

(RUF010)


165-165: Avoid specifying long messages outside the exception class

(TRY003)


167-167: Avoid specifying long messages outside the exception class

(TRY003)


230-230: Avoid specifying long messages outside the exception class

(TRY003)

cortex/user_preferences.py

141-141: Do not catch blind exception: Exception

(BLE001)


151-151: Do not catch blind exception: Exception

(BLE001)


203-205: try-except-pass detected, consider logging the exception

(S110)


203-203: Do not catch blind exception: Exception

(BLE001)


226-226: Consider moving this statement to an else block

(TRY300)


227-227: Do not catch blind exception: Exception

(BLE001)


228-228: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


228-228: Avoid specifying long messages outside the exception class

(TRY003)


228-228: Use explicit conversion flag

Replace with conversion flag

(RUF010)


250-250: Consider moving this statement to an else block

(TRY300)


253-253: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


253-253: Avoid specifying long messages outside the exception class

(TRY003)


253-253: Use explicit conversion flag

Replace with conversion flag

(RUF010)


254-254: Do not catch blind exception: Exception

(BLE001)


255-255: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


255-255: Avoid specifying long messages outside the exception class

(TRY003)


255-255: Use explicit conversion flag

Replace with conversion flag

(RUF010)


283-283: Consider moving this statement to an else block

(TRY300)


285-285: Do not catch blind exception: Exception

(BLE001)


286-286: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


286-286: Avoid specifying long messages outside the exception class

(TRY003)


286-286: Use explicit conversion flag

Replace with conversion flag

(RUF010)


328-328: Unused method argument: key_parts

(ARG002)


370-370: Do not catch blind exception: Exception

(BLE001)


371-371: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


371-371: Avoid specifying long messages outside the exception class

(TRY003)


375-375: Abstract raise to an inner function

(TRY301)


375-375: Avoid specifying long messages outside the exception class

(TRY003)


377-377: Abstract raise to an inner function

(TRY301)


377-377: Avoid specifying long messages outside the exception class

(TRY003)


382-382: Abstract raise to an inner function

(TRY301)


382-382: Avoid specifying long messages outside the exception class

(TRY003)


385-385: Abstract raise to an inner function

(TRY301)


385-385: Avoid specifying long messages outside the exception class

(TRY003)


387-387: Abstract raise to an inner function

(TRY301)


387-387: Avoid specifying long messages outside the exception class

(TRY003)


392-392: Abstract raise to an inner function

(TRY301)


392-392: Avoid specifying long messages outside the exception class

(TRY003)


399-399: Do not catch blind exception: Exception

(BLE001)


400-400: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


400-400: Avoid specifying long messages outside the exception class

(TRY003)


403-403: Abstract raise to an inner function

(TRY301)


403-403: Avoid specifying long messages outside the exception class

(TRY003)


410-410: Abstract raise to an inner function

(TRY301)


410-410: Avoid specifying long messages outside the exception class

(TRY003)


412-412: Abstract raise to an inner function

(TRY301)


412-412: Avoid specifying long messages outside the exception class

(TRY003)


417-417: Abstract raise to an inner function

(TRY301)


417-417: Avoid specifying long messages outside the exception class

(TRY003)


419-419: Abstract raise to an inner function

(TRY301)


419-419: Avoid specifying long messages outside the exception class

(TRY003)


426-426: Abstract raise to an inner function

(TRY301)


426-426: Avoid specifying long messages outside the exception class

(TRY003)


430-430: Consider moving this statement to an else block

(TRY300)


432-432: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


432-432: Avoid specifying long messages outside the exception class

(TRY003)


432-432: Use explicit conversion flag

Replace with conversion flag

(RUF010)


457-457: Avoid specifying long messages outside the exception class

(TRY003)


539-539: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (18)
LLM/test_interpreter.py (4)

15-24: LGTM!

The setUp correctly patches sys.modules to stub external dependencies and ensures cleanup via addCleanup.


49-54: LGTM!

The Kimi provider initialization test correctly verifies the default model kimi-k2-turbo-preview is set when no KIMI_DEFAULT_MODEL environment variable is present.


56-72: LGTM!

The test properly verifies successful Kimi API calls, including the Authorization header format and command parsing.


74-84: LGTM!

The failure test correctly verifies that HTTP errors are caught and re-raised as RuntimeError.

test/test_coordinator.py (1)

1-16: LGTM on import updates.

The added imports for Mock, patch, call, InstallationResult, and install_docker are appropriate for the test file's needs.

test/test_cli.py (3)

26-29: LGTM!

Good addition of the KIMI provider API key test, maintaining consistency with the other provider tests.


47-55: LGTM!

The KIMI provider detection and provider override tests correctly verify the _get_provider() logic for KIMI and the CORTEX_PROVIDER override behavior.


77-88: LGTM!

This test correctly verifies that the fake provider path bypasses API key validation and proceeds with command generation.

Note: The mock_history parameter (line 80) is unused but necessary to prevent InstallationHistory from performing file I/O. Consider prefixing with underscore (_mock_history) to signal intentional non-use.

setup.py (3)

15-15: LGTM on fallback requirements.

The fallback requirements list appropriately includes requests>=2.32.4 to support the new Kimi provider HTTP integration when no requirements file exists.


35-42: Verify Python 3.8 and 3.9 compatibility.

The minimum Python version was lowered from 3.10 to 3.8. Ensure the codebase doesn't use Python 3.10+ features (e.g., match statements, union types with | syntax).


33-33: Verify license change is intentional.

The license classifier was changed from Apache Software License to MIT License. Ensure this change is intentional and that the LICENSE file in the repository reflects this update.

requirements-dev.txt (1)

10-10: Version requests>=2.32.4 is valid.

The requests>=2.32.4 dependency is a released PyPI package (includes security fix for CVE-2024-47081). However, verify that this dependency is not already included in the root requirements.txt referenced at line 4 to avoid duplication.

cortex/cli.py (4)

47-63: LGTM! API key retrieval is clean and well-structured.

The provider-to-env-var mapping is correct and the error messaging guides users appropriately. Returning None for unknown providers (line 57) is acceptable since those would be handled elsewhere or caught in validation.


87-92: Fake provider handling is correct.

The special case for fake provider using CORTEX_FAKE_API_KEY or the default 'fake-api-key' is appropriate for testing scenarios. The early return on line 92 when the API key is missing prevents further execution with invalid credentials.


536-537: Documentation updates are helpful for discoverability.

Adding KIMI_API_KEY and CORTEX_PROVIDER to the CLI help text ensures users can discover these options via --help.


31-45: Provider detection logic looks good, but consider validating the CORTEX_PROVIDER override.

The logic correctly prioritizes CORTEX_PROVIDER override, then falls back to detecting keys in order: OpenAI → Claude → Kimi → Fake. The fallback to 'openai' at line 45 is reasonable as a default when no keys or override are found.

If CORTEX_PROVIDER is set to an invalid value (e.g., "groq"), it will be passed through and fail later in CommandInterpreter. Consider validating the override value against supported providers here to provide earlier, clearer feedback to users.
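A minimal sketch of that early validation, following the detection order described above (helper and env-var names other than CORTEX_PROVIDER, KIMI_API_KEY, and CORTEX_FAKE_COMMANDS are placeholders, not the actual cortex/cli.py code):

import os

SUPPORTED_PROVIDERS = {"openai", "claude", "kimi", "fake"}

def get_provider() -> str:
    override = os.environ.get("CORTEX_PROVIDER", "").strip().lower()
    if override:
        if override not in SUPPORTED_PROVIDERS:
            # Fail fast with a clear message instead of deep inside CommandInterpreter.
            raise ValueError(
                f"Unsupported CORTEX_PROVIDER={override!r}; "
                f"expected one of {sorted(SUPPORTED_PROVIDERS)}"
            )
        return override
    # Key-based detection order from the review: OpenAI -> Claude -> Kimi -> Fake.
    for env_var, provider in (
        ("OPENAI_API_KEY", "openai"),
        ("ANTHROPIC_API_KEY", "claude"),  # assumed env-var name
        ("KIMI_API_KEY", "kimi"),
        ("CORTEX_FAKE_COMMANDS", "fake"),
    ):
        if os.environ.get(env_var):
            return provider
    return "openai"  # default when nothing is configured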

LLM/interpreter.py (2)

51-61: Kimi initialization correctly validates the requests dependency.

The try/except for the requests import provides a clear error message if the package is missing. Setting self.client = None for the FAKE provider is appropriate.


157-187: Fake provider implementation is well-designed for testing.

The implementation:

  • Supports environment-driven commands via CORTEX_FAKE_COMMANDS
  • Validates JSON structure properly
  • Provides sensible keyword-based defaults for common test cases
  • Has a fallback for unknown inputs

This enables deterministic testing without API costs.
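As a rough usage sketch (the import path and exact constructor/parse signatures are assumptions based on this PR):

import json
import os

os.environ["CORTEX_FAKE_COMMANDS"] = json.dumps(
    {"commands": ["apt update", "apt install -y curl"]}
)

from LLM.interpreter import CommandInterpreter  # assumed import path

interpreter = CommandInterpreter(api_key="fake-api-key", provider="fake")
print(interpreter.parse("install curl"))
# -> ["apt update", "apt install -y curl"], with no network calls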

Copilot AI review requested due to automatic review settings December 2, 2025 15:18
@Sahilbhatane
Copy link
Collaborator Author

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1)

25-26: Kimi base URL documented inconsistently across sections

Earlier sections state the default KIMI_API_BASE_URL as https://api.moonshot.ai (and the advanced config example uses that), but the “API Request Format” section uses https://api.moonshot.cn/v1/chat/completions.

Please pick the correct default (ideally whatever CommandInterpreter actually uses by default) and make all occurrences consistent, or explicitly call out that one is the default and the other is only an example.

Also applies to: 48-49, 106-111, 129-145

🧹 Nitpick comments (7)
LLM/test_interpreter.py (3)

49-55: Avoid clearing the entire environment in Kimi init test

@patch.dict(os.environ, {}, clear=True) guarantees a clean slate for Kimi-related env vars, but it also temporarily removes all environment variables for this test, which can cause surprising behavior if CommandInterpreter or any imported code ever depends on unrelated env settings (e.g., HOME, PATH). It’s safer to patch only the keys you care about.

Consider:

-    @patch.dict(os.environ, {}, clear=True)
+    @patch.dict(os.environ, {
+        "KIMI_API_KEY": "",
+        "KIMI_API_BASE_URL": "",
+        "KIMI_DEFAULT_MODEL": "",
+    }, clear=False)

This keeps non-Kimi env intact while still testing default Kimi behavior.


274-341: Integration tests are well-gated; custom-model env handling now safe

The TestKimiK2Integration class:

  • Uses @unittest.skipUnless(RUN_KIMI_INTEGRATION_TESTS == "1") to keep real Kimi calls out of normal CI.
  • Validates basic, complex, and validated-command flows against the live API.
  • Uses @patch.dict(os.environ, {'KIMI_DEFAULT_MODEL': 'kimi-k2-0905-preview'}) so the custom-model test no longer leaks env state after failures (addressing the earlier cleanup concern).

Two minor suggestions:

  1. Since these are external-network integration tests, consider moving them to a dedicated integration module/package to keep this unit-test file focused on pure unit tests.
  2. The doc file currently describes the default Kimi model as kimi-k2, while these tests (and likely the code) use kimi-k2-turbo-preview as the default/custom choice; please align the documentation with the actual default behavior.

Both are non-blocking; current structure is functionally correct.


56-73: Kimi success test is solid; asserting the request URL and payload structure would be a valuable strengthening, since the Kimi API is stable and OpenAI-compatible

The success path correctly verifies:

  • Parsed commands from _call_kimi
  • That requests.post is called
  • Presence and value of the Authorization header

The suggested assertion for the HTTP contract is well-founded. Kimi's API uses a documented, stable OpenAI-compatible endpoint at /v1/chat/completions with standard payload structure (model field, messages array). Adding assertions for the URL path and key JSON fields will reliably catch regressions in _call_kimi's request shape without becoming brittle:

call_args, call_kwargs = mock_post.call_args
self.assertIn("/v1/chat/completions", call_args[0])
self.assertEqual(call_kwargs["json"]["model"], interpreter.model)
self.assertIn("messages", call_kwargs["json"])

This strengthens test coverage by locking in the Kimi API contract, which is officially documented and widely implemented by third-party integrators.

docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (4)

189-194: Add language hint to fenced code in Testing Results

The test output block uses a bare fenced code block. To satisfy markdownlint (MD040) and improve rendering, you can add a language, e.g.:

-```
+```text
 Ran 143 tests in 10.136s

 OK (skipped=5)

3-6: Markdownlint nits: bare URLs and minor wording

A few small polish items from static analysis:

  • Replace “CLI interface improvements” with just “CLI improvements” to avoid tautology.
  • Wrap bare URLs in Markdown links or <...> to satisfy MD034 (e.g., [Moonshot AI Platform](https://platform.moonshot.ai/)).

These are purely cosmetic and can be deferred.

Also applies to: 83-87, 190-196, 234-236


5-7: Status/Date fields may need updating when PR merges

The header states Status: ✅ Implemented and a specific “Date Completed”, but this PR is still open. To avoid confusion for future readers, consider phrasing this as “Planned/Implemented in PR #192 (pending merge)” or updating the status/date when the PR actually lands.


50-55: Consider updating requests version constraint and documenting versioning approach

The CVE information for requests>=2.32.4 is accurate (all three CVEs are properly fixed at that version). However, the latest requests release is 2.32.5 (August 2025), making the constraint outdated within months. To make this section age more gracefully, consider:

  • Updating to requests>=2.32.5 for the latest patch, or
  • Noting "(minimum secure version as of [date])" and/or linking to upstream security advisories rather than hard-coding CVE IDs inline.

This ensures the documentation remains correct without requiring constant updates as newer patch versions are released.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b9a36e and e0faa7b.

📒 Files selected for processing (2)
  • LLM/test_interpreter.py (5 hunks)
  • docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
LLM/test_interpreter.py (1)
LLM/interpreter.py (3)
  • APIProvider (9-15)
  • _call_kimi (118-155)
  • parse (227-246)
🪛 LanguageTool
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md

[style] ~214-~214: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ... addressed in PR #192) - Issue #11: CLI interface improvements (referenced in commits) - ...

(ACRONYM_TAUTOLOGY)

🪛 markdownlint-cli2 (0.18.1)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md

3-3: Bare URL used

(MD034, no-bare-urls)


4-4: Bare URL used

(MD034, no-bare-urls)


190-190: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


234-234: Bare URL used

(MD034, no-bare-urls)


235-235: Bare URL used

(MD034, no-bare-urls)


236-236: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🔇 Additional comments (3)
LLM/test_interpreter.py (3)

3-6: Sys-module stubbing for OpenAI/Anthropic looks correct and contained

Using SimpleNamespace plus patch.dict(sys.modules, ...) in setUp cleanly isolates tests from real openai/anthropic installs and avoids ImportError, and addCleanup(self.sys_modules_patcher.stop) ensures restoration after each test. This pattern is sound for these unit tests.

Also applies to: 15-24


26-31: OpenAI default model assertion aligns with intended behavior

Changing the expected default model to "gpt-4o" in test_initialization_openai matches the stated best-practice default elsewhere in the PR and keeps this test tightly coupled to the current interpreter default. No issues here.


74-85: Kimi failure path test matches _call_kimi error behavior

Simulating a 401 via raise_for_status.side_effect and asserting a RuntimeError from _call_kimi correctly exercises the generic exception-wrapping logic. This gives good coverage of the negative path without overfitting on the error message.

Copy link

Copilot AI left a comment


Pull request overview

This PR successfully integrates Moonshot AI's Kimi K2 as a new LLM provider for Cortex, expanding multi-LLM capabilities alongside OpenAI and Anthropic. It also introduces a FAKE provider for cost-free testing and adds comprehensive Docker-based integration tests.

Key Changes

  • Provider Integration: Added native Kimi K2 support with HTTP-based API calls, configurable via KIMI_API_KEY environment variable
  • Testing Infrastructure: Introduced FAKE provider for offline testing and Docker-based integration test suite with helper utilities
  • Documentation: Added comprehensive testing strategy guide, contribution guidelines, and Kimi K2 implementation reference documentation

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 16 comments.

Show a summary per file
File Description
test/test_coordinator.py Removed obsolete test methods and added unused imports that should be cleaned up
test/test_cli.py Added Kimi and FAKE provider test coverage for API key and provider detection
test/integration/test_end_to_end.py New Docker-based integration tests for end-to-end CLI workflows
test/integration/docker_utils.py Docker container execution utilities for integration testing
test/integration/__init__.py Integration test package initialization
LLM/test_interpreter.py Added Kimi K2 integration tests with real API call validation (conditionally skipped)
LLM/interpreter.py Core Kimi K2 and FAKE provider implementation with API call methods
cortex/cli.py Enhanced provider detection with override support and provider-aware API key handling
cortex/user_preferences.py Major refactoring with improved validation and reduced language support list
setup.py Added requests dependency, updated Python version support (3.8+), corrected license
requirements.txt Added requests and PyYAML dependencies
requirements-dev.txt Added requests to dev dependencies
LLM/requirements.txt Added requests for Kimi HTTP client
test.md Comprehensive testing strategy documentation with Docker requirements
contribution.md New contributor guidelines with coding standards and workflow
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md Detailed implementation reference with API examples and configuration
Comments suppressed due to low confidence (2)

test/test_coordinator.py:2

  • Import of 'call' is not used.
from unittest.mock import Mock, patch, call

test/test_coordinator.py:5

  • Import of 'time' is not used.
import time


Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1)

21-22: Default Kimi model in documentation doesn't match code.

The documentation states kimi-k2 as the default model (lines 21, 48, 138), but the code in LLM/interpreter.py uses kimi-k2-turbo-preview as the actual default (line 268). This inconsistency may confuse users.

Update the documentation to consistently use kimi-k2-turbo-preview or change the code default if kimi-k2 is the intended value.

Also applies to: 48-48, 138-138

🧹 Nitpick comments (9)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1)

190-194: Add language specifier to fenced code block.

The test results code block is missing a language specifier which aids syntax highlighting and accessibility.

-```
+```text
 Ran 143 tests in 10.136s

 OK (skipped=5)

cortex/user_preferences.py (3)

328-342: Remove unused key_parts parameter.

The key_parts parameter is never used in _coerce_value_for_set(). Either remove it or implement the intended type coercion logic based on the key path.


-    def _coerce_value_for_set(self, key_parts: List[str], value: Any) -> Any:
+    def _coerce_value_for_set(self, value: Any) -> Any:
         """Coerce string inputs into appropriate types for set()."""

And update the call site at line 361:

-            coerced = self._coerce_value_for_set(parts, value)
+            coerced = self._coerce_value_for_set(value)

202-206: Silent exception swallowing may hide configuration issues.

The try-except-pass block silently ignores all load errors during initialization. This could mask configuration file corruption or permission issues that users should be aware of.

Consider logging the exception or storing the error state:

         try:
             self.load()
         except Exception:
-            # Ignore load errors here; callers can handle later
-            pass
+            # Store the error; callers can check or load() will re-raise
+            self._preferences = UserPreferences()

Alternatively, if errors truly should be ignored, at least log them at debug level for troubleshooting.
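A self-contained sketch of the debug-logging variant (class shape simplified; the real manager lives in cortex/user_preferences.py):

import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

@dataclass
class UserPreferences:  # stand-in for the real dataclass
    language: str = "en"

class PreferencesManager:
    def __init__(self) -> None:
        self._preferences = UserPreferences()
        try:
            self.load()
        except Exception as exc:
            # Keep init non-fatal, but leave a trail for troubleshooting.
            logger.debug("Ignoring preferences load error during init: %s", exc)

    def load(self) -> None:
        raise IOError("placeholder failure path")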


252-256: Chain exceptions to preserve traceback context.

When re-raising exceptions, use raise ... from e to preserve the original traceback for easier debugging.

         except yaml.YAMLError as e:
-            raise ValueError(f"Invalid YAML in config file: {str(e)}")
+            raise ValueError(f"Invalid YAML in config file: {e!s}") from e
         except Exception as e:
-            raise IOError(f"Failed to load config file: {str(e)}")
+            raise IOError(f"Failed to load config file: {e!s}") from e

The same pattern applies to other exception handlers in save() (line 286), _create_backup() (line 228), and set() (line 432).

test/integration/test_end_to_end.py (1)

107-110: Potential fragile assertion on last output line.

The assertion self.assertIn("OK", result.stdout.splitlines()[-1]) will raise an IndexError if stdout is empty. Consider adding a guard or using a more robust check.

         result = self._run("python test/run_all_tests.py", env=env)

         self.assertTrue(result.succeeded(), msg=result.stderr)
-        self.assertIn("OK", result.stdout.splitlines()[-1])
+        lines = result.stdout.splitlines()
+        self.assertTrue(lines, "Expected test output but stdout was empty")
+        self.assertIn("OK", lines[-1])
LLM/test_interpreter.py (1)

56-72: Mock response missing raise_for_status stub.

The mock response in test_call_kimi_success doesn't stub raise_for_status(). While this works because Mock() returns another mock by default, explicitly stubbing it ensures the test intent is clear and protects against future changes.

         mock_response = Mock()
         mock_response.status_code = 200
+        mock_response.raise_for_status = Mock()  # Explicitly stub to avoid accidental side effects
         mock_response.json.return_value = {
             "choices": [{"message": {"content": '{"commands": ["apt update", "apt install curl"]}'}}]
         }
LLM/interpreter.py (1)

136-137: Redundant requests import inside _call_kimi.

The requests module is already imported (and validated) during _initialize_client() for the KIMI provider (lines 52-58). The import at line 136 is redundant and adds unnecessary overhead on each API call.

         try:
-            import requests
             response = requests.post(

Since self.client is set to the requests module in _initialize_client(), you can use self.client.post(...) instead for consistency:

-            import requests
-            response = requests.post(
+            response = self.client.post(
test/integration/docker_utils.py (2)

24-50: docker_available() is robust; subprocess security lint is low risk here

The availability check via docker --version and docker info with small timeouts and shell=False is solid for a test helper. Ruff’s S603 warnings here are low risk since arguments are constant and not user-controlled; if those warnings are noisy in CI, you can either # noqa: S603 these calls or disable that rule for test modules.
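For reference, a rough reconstruction of the helper under the constraints described above (the real implementation in test/integration/docker_utils.py may differ in flags and timeouts):

import subprocess

def docker_available(timeout: float = 5.0) -> bool:
    """Return True only if the docker CLI exists and the daemon responds."""
    for cmd in (["docker", "--version"], ["docker", "info"]):
        try:
            result = subprocess.run(  # noqa: S603 - constant argv, shell=False
                cmd, capture_output=True, timeout=timeout, check=False
            )
        except (FileNotFoundError, subprocess.TimeoutExpired, OSError):
            return False
        if result.returncode != 0:
            return False
    return True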


53-106: run_in_docker helper looks good; minor f-string / lint tweak and trust boundary note

The argv-based docker invocation plus bash -lc inside the container is appropriate for an integration-test harness and avoids host-side shell injection; just ensure callers only pass trusted command strings since they execute with full shell semantics in the container. To address the RUF010 hint and avoid str() inside an f-string, you can simplify the volume mount construction:

-    for host_path, container_path in mounts or []:
-        docker_cmd.extend([
-            "-v",
-            f"{str(host_path.resolve())}:{container_path}",
-        ])
+    for host_path, container_path in mounts or []:
+        docker_cmd.extend(
+            [
+                "-v",
+                f"{host_path.resolve()!s}:{container_path}",
+            ]
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0faa7b and 9746f92.

📒 Files selected for processing (7)
  • LLM/interpreter.py (9 hunks)
  • LLM/test_interpreter.py (5 hunks)
  • cortex/user_preferences.py (5 hunks)
  • docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1 hunks)
  • test/integration/docker_utils.py (1 hunks)
  • test/integration/test_end_to_end.py (1 hunks)
  • test/test_coordinator.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/test_coordinator.py
🧰 Additional context used
🧬 Code graph analysis (1)
LLM/test_interpreter.py (1)
LLM/interpreter.py (3)
  • APIProvider (9-15)
  • _call_kimi (118-153)
  • parse (225-244)
🪛 LanguageTool
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md

[style] ~214-~214: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ... addressed in PR #192) - Issue #11: CLI interface improvements (referenced in commits) - ...

(ACRONYM_TAUTOLOGY)

🪛 markdownlint-cli2 (0.18.1)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md

3-3: Bare URL used

(MD034, no-bare-urls)


4-4: Bare URL used

(MD034, no-bare-urls)


190-190: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


234-234: Bare URL used

(MD034, no-bare-urls)


235-235: Bare URL used

(MD034, no-bare-urls)


236-236: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.7)
LLM/interpreter.py

55-55: Avoid specifying long messages outside the exception class

(TRY003)


147-147: Abstract raise to an inner function

(TRY301)


147-147: Avoid specifying long messages outside the exception class

(TRY003)


150-150: Abstract raise to an inner function

(TRY301)


150-150: Avoid specifying long messages outside the exception class

(TRY003)


153-153: Avoid specifying long messages outside the exception class

(TRY003)


153-153: Use explicit conversion flag

Replace with conversion flag

(RUF010)


163-163: Avoid specifying long messages outside the exception class

(TRY003)


165-165: Avoid specifying long messages outside the exception class

(TRY003)


228-228: Avoid specifying long messages outside the exception class

(TRY003)

test/integration/docker_utils.py

32-32: subprocess call: check for execution of untrusted input

(S603)


40-40: subprocess call: check for execution of untrusted input

(S603)


48-48: Consider moving this statement to an else block

(TRY300)


89-89: Use explicit conversion flag

Replace with conversion flag

(RUF010)


97-97: subprocess call: check for execution of untrusted input

(S603)

cortex/user_preferences.py

142-142: Do not catch blind exception: Exception

(BLE001)


152-152: Do not catch blind exception: Exception

(BLE001)


204-206: try-except-pass detected, consider logging the exception

(S110)


204-204: Do not catch blind exception: Exception

(BLE001)


226-226: Consider moving this statement to an else block

(TRY300)


227-227: Do not catch blind exception: Exception

(BLE001)


228-228: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


228-228: Avoid specifying long messages outside the exception class

(TRY003)


228-228: Use explicit conversion flag

Replace with conversion flag

(RUF010)


250-250: Consider moving this statement to an else block

(TRY300)


253-253: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


253-253: Avoid specifying long messages outside the exception class

(TRY003)


253-253: Use explicit conversion flag

Replace with conversion flag

(RUF010)


254-254: Do not catch blind exception: Exception

(BLE001)


255-255: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


255-255: Avoid specifying long messages outside the exception class

(TRY003)


255-255: Use explicit conversion flag

Replace with conversion flag

(RUF010)


283-283: Consider moving this statement to an else block

(TRY300)


285-285: Do not catch blind exception: Exception

(BLE001)


286-286: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


286-286: Avoid specifying long messages outside the exception class

(TRY003)


286-286: Use explicit conversion flag

Replace with conversion flag

(RUF010)


328-328: Unused method argument: key_parts

(ARG002)


370-370: Do not catch blind exception: Exception

(BLE001)


371-371: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


371-371: Avoid specifying long messages outside the exception class

(TRY003)


375-375: Abstract raise to an inner function

(TRY301)


375-375: Avoid specifying long messages outside the exception class

(TRY003)


377-377: Abstract raise to an inner function

(TRY301)


377-377: Avoid specifying long messages outside the exception class

(TRY003)


382-382: Abstract raise to an inner function

(TRY301)


382-382: Avoid specifying long messages outside the exception class

(TRY003)


385-385: Abstract raise to an inner function

(TRY301)


385-385: Avoid specifying long messages outside the exception class

(TRY003)


387-387: Abstract raise to an inner function

(TRY301)


387-387: Avoid specifying long messages outside the exception class

(TRY003)


392-392: Abstract raise to an inner function

(TRY301)


392-392: Avoid specifying long messages outside the exception class

(TRY003)


399-399: Do not catch blind exception: Exception

(BLE001)


400-400: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


400-400: Avoid specifying long messages outside the exception class

(TRY003)


403-403: Abstract raise to an inner function

(TRY301)


403-403: Avoid specifying long messages outside the exception class

(TRY003)


410-410: Abstract raise to an inner function

(TRY301)


410-410: Avoid specifying long messages outside the exception class

(TRY003)


412-412: Abstract raise to an inner function

(TRY301)


412-412: Avoid specifying long messages outside the exception class

(TRY003)


417-417: Abstract raise to an inner function

(TRY301)


417-417: Avoid specifying long messages outside the exception class

(TRY003)


419-419: Abstract raise to an inner function

(TRY301)


419-419: Avoid specifying long messages outside the exception class

(TRY003)


426-426: Abstract raise to an inner function

(TRY301)


426-426: Avoid specifying long messages outside the exception class

(TRY003)


430-430: Consider moving this statement to an else block

(TRY300)


432-432: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


432-432: Avoid specifying long messages outside the exception class

(TRY003)


432-432: Use explicit conversion flag

Replace with conversion flag

(RUF010)


457-457: Avoid specifying long messages outside the exception class

(TRY003)


539-539: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (5)
test/integration/test_end_to_end.py (1)

27-37: Return type annotation on _run is already in place.

The signature def _run(self, command: str, env: dict | None = None) -> DockerRunResult: already carries both parameter hints and the DockerRunResult return annotation, so no change is needed here.

LLM/test_interpreter.py (1)

274-285: LGTM - Integration test class is well-guarded.

The skip decorator with RUN_KIMI_INTEGRATION_TESTS environment variable check and the setUp method's skipTest for missing API key provide appropriate guards for real API tests. The docstring correctly shows the unittest command.

LLM/interpreter.py (2)

155-166: LGTM - _call_fake correctly returns commands directly.

The implementation properly validates that commands is a list and returns it directly, avoiding the redundant JSON parsing that was previously flagged.


260-269: LGTM - Default model resolution is well-implemented.

The _default_model method correctly provides current model defaults for each provider, with the Kimi provider supporting environment variable override via KIMI_DEFAULT_MODEL. The OpenAI default has been updated to gpt-4o addressing the previous deprecation concern.
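For readers without the diff handy, the resolution looks roughly like this (reconstructed from this thread; the exact strings live in LLM/interpreter.py):

import os

def _default_model(self) -> str:
    # Sketch of the provider-specific defaults described in this review.
    if self.provider == APIProvider.OPENAI:
        return "gpt-4o"
    if self.provider == APIProvider.CLAUDE:
        return "claude-sonnet-4-20250514"
    if self.provider == APIProvider.KIMI:
        return os.environ.get("KIMI_DEFAULT_MODEL", "kimi-k2-turbo-preview")
    if self.provider == APIProvider.OLLAMA:
        return "llama3.2"
    return "fake-local-model"  # APIProvider.FAKE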

test/integration/docker_utils.py (1)

11-21: DockerRunResult dataclass is a clean abstraction

The dataclass and succeeded() helper neatly wrap subprocess results; no changes needed.
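Its plausible shape, based on how the tests use it (field names beyond stdout, stderr, and succeeded() are assumptions):

from dataclasses import dataclass

@dataclass
class DockerRunResult:
    returncode: int
    stdout: str
    stderr: str

    def succeeded(self) -> bool:
        return self.returncode == 0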

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
LLM/interpreter.py (1)

30-34: Use _default_model() in __init__ to fix incorrect defaults for KIMI/FAKE

Right now, __init__ hard-codes default models:

if model:
    self.model = model
else:
    self.model = "gpt-4" if self.provider == APIProvider.OPENAI else "claude-sonnet-4-20250514"

You’ve introduced _default_model() with provider-specific defaults (including KIMI and FAKE), but __init__ doesn’t call it. As a result:

  • KIMI and FAKE instances created without an explicit model will incorrectly get "claude-sonnet-4-20250514" (the else branch), never using the KIMI/FAKE defaults defined in _default_model().
  • There are now two divergent sources of truth for default models, which is brittle.

This is a functional bug for KIMI/FAKE defaults and a maintainability issue overall. Suggest centralizing on _default_model():

     def __init__(
         self,
         api_key: str,
         provider: str = "openai",
         model: Optional[str] = None
     ):
         self.api_key = api_key
         self.provider = APIProvider(provider.lower())
-        
-        if model:
-            self.model = model
-        else:
-            self.model = "gpt-4" if self.provider == APIProvider.OPENAI else "claude-sonnet-4-20250514"
-        
+
+        # Use provider-specific defaults, overridden by explicit `model` if provided.
+        self.model = model or self._default_model()
+
         self._initialize_client()

With this, KIMI and FAKE will correctly pick up "kimi-k2-turbo-preview" (or KIMI_DEFAULT_MODEL) and "fake-local-model" when no explicit model is passed, and there’s a single place to adjust defaults going forward.

Also applies to: 260-269

🧹 Nitpick comments (1)
LLM/interpreter.py (1)

37-62: Unify KIMI client setup and HTTP call; avoid re-importing requests

For the KIMI path, _initialize_client() imports requests, assigns it to self.client, and sets _kimi_base_url, but _call_kimi() re-imports requests and doesn’t use self.client. This duplication slightly complicates testability and makes the client setup pattern inconsistent with the other providers.

You can simplify and make this more consistent by relying on self.client in _call_kimi() and dropping the second import:

     def _call_kimi(self, user_input: str) -> List[str]:
         """Call the Kimi K2 HTTP API and parse the response body."""
@@
-        try:
-            import requests
-            response = requests.post(
+        try:
+            response = self.client.post(
                 f"{self._kimi_base_url.rstrip('/')}/v1/chat/completions",
                 headers=headers,
                 json=payload,
                 timeout=60,
             )

(Optionally, you could make self.client a requests.Session() in _initialize_client() if you want connection pooling, but that’s not required for correctness.)

Also, please double-check that the endpoint path (/v1/chat/completions) and response shape (choices[0].message.content) still match the current Kimi K2 API.

Also applies to: 118-154
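
A sketch of the optional Session variant (attribute names follow the snippet above; the base-URL fallback is taken from elsewhere in this thread):

import os
import requests

def _initialize_client(self) -> None:
    # A Session gives connection pooling across repeated _call_kimi invocations.
    self.client = requests.Session()
    self._kimi_base_url = os.environ.get(
        "KIMI_API_BASE_URL", "https://api.moonshot.ai"
    )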

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9746f92 and f8038b9.

📒 Files selected for processing (1)
  • LLM/interpreter.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
LLM/interpreter.py (1)
cortex/user_preferences.py (2)
  • get (288-315)
  • validate (463-500)
🪛 Ruff (0.14.7)
LLM/interpreter.py

55-55: Avoid specifying long messages outside the exception class

(TRY003)


147-147: Abstract raise to an inner function

(TRY301)


147-147: Avoid specifying long messages outside the exception class

(TRY003)


150-150: Abstract raise to an inner function

(TRY301)


150-150: Avoid specifying long messages outside the exception class

(TRY003)


153-153: Avoid specifying long messages outside the exception class

(TRY003)


153-153: Use explicit conversion flag

Replace with conversion flag

(RUF010)


163-163: Avoid specifying long messages outside the exception class

(TRY003)


165-165: Avoid specifying long messages outside the exception class

(TRY003)


228-228: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
LLM/interpreter.py (2)

10-15: Enum additions for KIMI/FAKE look consistent with the rest of the interpreter

The new APIProvider.KIMI and APIProvider.FAKE members align with the branches in _initialize_client and parse(), and the string values match the provider argument usage. No changes needed here.


64-80: Docstrings and high-level flow for parsing/validation are clear

The added/updated docstrings around _get_system_prompt, provider-specific call methods, _parse_commands, _validate_commands, parse, and parse_with_context accurately describe the behavior and make the NL→commands pipeline much easier to follow. No issues here.

Also applies to: 82-98, 101-116, 187-203, 205-223, 225-244, 246-258

@dhvll
Copy link
Collaborator

dhvll commented Dec 5, 2025

I saw both of your videos. There is no clear showcasing of Kimi usage. Have you even used an API key to test whether your code is working? Create a separate video that just showcases Kimi usage, or how you tested it on your system. Testing as in proper usage, not running test files, FYI.

@Sahilbhatane
Copy link
Collaborator Author

Sahilbhatane commented Dec 5, 2025

I can pull up an echo command to show the API key is in the environment, or I can add a short function to the CLI that shows which LLM is being used for command execution.
Which method would you like?
The 2nd one can be converted into a proper cortex command like "cortex check-LLM".
If the 1st, I will only add a screenshot, since making a video for it is not necessary; for the 2nd, I can make a video showing the function and its execution. I will also show that the API key rests in the environment through an echo command in the video 👍🏻
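
A rough sketch of what the 2nd option could look like (names are placeholders, not a final implementation):

def check_llm(self) -> int:
    """Hypothetical 'cortex check-LLM' handler: report the active provider."""
    provider = self._get_provider()
    has_key = bool(self._get_api_key(provider))
    cx_print(f"Active provider: {provider}")  # cx_print from cortex/branding.py
    cx_print(f"API key configured: {'yes' if has_key else 'no'}")
    return 0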

@dhvll
Copy link
Collaborator

dhvll commented Dec 5, 2025

Echo is more like a print statement.
Show your implementation using Postman.
I am not asking whether the API key is in the environment or where your code is; I can read the codebase. I am asking how you know your Kimi implementation is working.
Add a recorded video under this comment, showing only Kimi API usage; don't add unnecessary extras.

@Sahilbhatane
Copy link
Collaborator Author

@mikejmorgan-ai
Copy link
Member

@dhvll Could you review this PR? Would appreciate your testing perspective on the Kimi K2 provider. Thanks!

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
LLM/interpreter.py (1)

31-40: Missing default model assignment for KIMI and FAKE providers.

When model is not explicitly provided and provider is "kimi" or "fake", self.model is never assigned. This causes an AttributeError in _call_kimi (line 136) when building the payload.

Apply this diff to add default models for the new providers:

         if model:
             self.model = model
         else:
             if self.provider == APIProvider.OPENAI:
                 self.model = "gpt-4"
             elif self.provider == APIProvider.CLAUDE:
                 self.model = "claude-sonnet-4-20250514"
+            elif self.provider == APIProvider.KIMI:
+                self.model = os.environ.get("KIMI_DEFAULT_MODEL", "kimi-k2-turbo-preview")
+            elif self.provider == APIProvider.FAKE:
+                self.model = "fake-local-model"
             elif self.provider == APIProvider.OLLAMA:
                 self.model = "llama3.2"  # Default Ollama model

Alternatively, consolidate this logic by calling self._default_model() (defined at line 306) instead of duplicating defaults.

♻️ Duplicate comments (1)
LLM/interpreter.py (1)

169-176: Missing key existence check before accessing data["commands"].

If the JSON in CORTEX_FAKE_COMMANDS is valid but lacks the "commands" key, line 174 raises a bare KeyError instead of a descriptive ValueError.

This was flagged in a previous review. Add an explicit key check:

             try:
                 data = json.loads(payload)
             except json.JSONDecodeError as exc:
                 raise ValueError("CORTEX_FAKE_COMMANDS must contain valid JSON") from exc
+            if "commands" not in data:
+                raise ValueError("CORTEX_FAKE_COMMANDS must contain a 'commands' key")
             if not isinstance(data["commands"], list):  
                 raise ValueError("'commands' must be a list in CORTEX_FAKE_COMMANDS")  
             return data["commands"]
🧹 Nitpick comments (2)
LLM/interpreter.py (2)

145-152: Redundant requests import; use the stored client instead.

requests is already imported and stored in self.client during _initialize_client() (line 63). The re-import here is unnecessary.

         try:
-            import requests
-            response = requests.post(
+            response = self.client.post(
                 f"{self._kimi_base_url.rstrip('/')}/v1/chat/completions",
                 headers=headers,
                 json=payload,
                 timeout=60,
             )

306-315: _default_model() is defined but never called.

This method duplicates the default model logic already present in __init__ (lines 34-39) but is never invoked. This creates maintenance burden and potential drift between the two locations.

Either:

  1. Remove _default_model() if it was added speculatively, or
  2. Refactor __init__ to use it:
         if model:
             self.model = model
         else:
-            if self.provider == APIProvider.OPENAI:
-                self.model = "gpt-4"
-            elif self.provider == APIProvider.CLAUDE:
-                self.model = "claude-sonnet-4-20250514"
-            elif self.provider == APIProvider.OLLAMA:
-                self.model = "llama3.2"  # Default Ollama model
+            self.model = self._default_model()

Note: This requires moving the method definition before __init__ or deferring the call until after full initialization.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b32082 and 791427e.

📒 Files selected for processing (3)
  • LLM/interpreter.py (10 hunks)
  • cortex/cli.py (3 hunks)
  • requirements.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • requirements.txt
🧰 Additional context used
🧬 Code graph analysis (1)
cortex/cli.py (1)
cortex/branding.py (1)
  • cx_print (55-75)
🪛 Ruff (0.14.7)
LLM/interpreter.py

61-61: Avoid specifying long messages outside the exception class

(TRY003)


157-157: Abstract raise to an inner function

(TRY301)


157-157: Avoid specifying long messages outside the exception class

(TRY003)


160-160: Abstract raise to an inner function

(TRY301)


160-160: Avoid specifying long messages outside the exception class

(TRY003)


163-163: Avoid specifying long messages outside the exception class

(TRY003)


163-163: Use explicit conversion flag

Replace with conversion flag

(RUF010)


173-173: Avoid specifying long messages outside the exception class

(TRY003)


175-175: Avoid specifying long messages outside the exception class

(TRY003)


272-272: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (7)
LLM/interpreter.py (3)

57-64: LGTM for KIMI client initialization.

The import check and base URL configuration are properly handled. The fallback to https://api.moonshot.ai is reasonable.


178-195: LGTM for fake provider defaults.

The safe defaults use only echo commands, making them safe for dry-run and integration tests. The keyword matching and fallback logic are reasonable.


278-281: LGTM for new provider routing.

The KIMI and FAKE providers are correctly routed to their respective handler methods, consistent with the existing pattern.

cortex/cli.py (4)

68-71: Auto-detection of fake provider via CORTEX_FAKE_COMMANDS presence.

This implicit detection allows tests to set CORTEX_FAKE_COMMANDS without needing CORTEX_PROVIDER=fake. While convenient for testing, it could surprise users who set the variable for other reasons (e.g., debugging) without intending to switch providers.

Consider documenting this behavior or requiring explicit CORTEX_PROVIDER=fake for production safety.


76-99: LGTM for provider-aware API key retrieval.

The method correctly maps providers to their environment variables and provides helpful guidance when keys are missing. The Ollama placeholder and structured error messaging are appropriate.


137-143: LGTM for fake provider handling in install flow.

The fake provider correctly bypasses real API key requirements with a sensible default, enabling offline testing without additional configuration.


830-831: LGTM for CLI help documentation.

The epilog correctly documents the new KIMI_API_KEY and CORTEX_PROVIDER environment variables, including all supported provider options.

@Sahilbhatane Sahilbhatane self-assigned this Dec 10, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 791427e and ad1080d.

📒 Files selected for processing (1)
  • requirements-dev.txt (1 hunks)
🔇 Additional comments (1)
requirements-dev.txt (1)

4-5: New dev dependencies appropriately support test infrastructure.

The addition of pytest-mock, requests, PyYAML, and pre-commit aligns well with the PR's integration test suite and multi-provider support (Kimi K2, FAKE). These dependencies enable mocking, HTTP interactions, YAML configuration handling, and pre-commit consistency checks.

Also applies to: 8-8, 25-25

PyYAML>=6.0.0

# Code Quality
black>=23.0.0
Copy link
Contributor


⚠️ Potential issue | 🔴 Critical

Remove duplicate black entry with conflicting versions.

The file contains two entries for the black package with different version constraints (line 11: >=23.0.0 and line 22: >=24.0.0). This creates ambiguous dependency resolution and will cause issues with package managers.

Apply this diff to remove the duplicate:

  # Code Quality
- black>=23.0.0
  pylint>=2.17.0
  mypy>=1.0.0

Retain line 22's black>=24.0.0 as the single, authoritative version constraint.

Also applies to: 22-22

🤖 Prompt for AI Agents
In requirements-dev.txt around lines 11 to 22, there are duplicate entries for
black with conflicting versions; remove the earlier entry at line 11
(`black>=23.0.0`) so only the authoritative `black>=24.0.0` at line 22 remains,
ensuring no duplicate/conflicting constraints in the file.

@Sahilbhatane Sahilbhatane linked an issue Dec 11, 2025 that may be closed by this pull request
@sonarqubecloud
Copy link



Development

Successfully merging this pull request may close these issues.

Kimi K2 API Integration
End-to-end integration test suite
